Linux Networx

• 32 node Linux cluster (64 processors)

• 4.4 TB Panasas storage

• Hosted at ARL-MSRC - leverages existing support

Provide real-time test data verification, analysis and warehousing

Provide  OLAP  tools  for  test  data  analysis  and  data  mining 


ATC  DC  Proposal 


•Achieve  real-time  data  fusion  to  provide  real-time  analytic  and 
decision  support 


•Establish  parallel  post  processing  capabilities  to  effect  knowledge 
extraction 


•Institute  a  high  performance  data  warehouse 


•Real  time  quality  control  -  utilizing  historic  data  sets 

ATC DC Timeline

•Oct-2003 - Proposal selected

•4-May-2004 - System Delivered

•28-June-2004  -  System  on  network  accepting  connections 

•July-2004  -  System  Testing  Complete 

•Sept-2004 - Current data handling process (Sun E10K) ported to DC

•Sept-2004 - Kerberized filters in place to allow web access to data warehouse (ARL-PET Dr. Walter Landry)

•Nov-2004  -  OS  Change  from  RHES  to  SuSE  ES9  -  Slave  node  NFS  issues 

•Dec-2004  -  Processing  apps  running  with  mpiJava 

•Dec-2004  -  Tomcat  running  in  a  JavaParty  environment 

•Nov/Dec-2004  -  Army  Science  Conference  demo  of  Data  Warehouse 

•Feb-2005 - Processing apps running with JavaParty

•April-2005  -  Automated  scripts  to  poll  ATC  concentrator  for  new  data  files 

Real Time Data Fusion


• Test data collected via on-board instrumentation (VxWorks based computer). Each instrument produces a continuous time history record of up to 250 parameters, at up to 10 kHz each. Files are closed approx. every 15-30 minutes. Single file size ranges from 10 KB to 100 MB. A test item may have multiple instruments recording simultaneously.

•  Must  move  raw  data  files  from  instrumentation  to  cluster  for 
processing.  Wireless  or  PC-Card  harvesting. 

•  When  raw  data  files  show  up  on  cluster  -  Java  based 
conversion  (raw  to  HDF5)  process  must  fire  automatically. 

•  Report  applications  fire,  creating  reports  (PDF,  Excel  etc.)  on  the 
just  processed  data. 

•  Reports  auto-published  to  web  based  Digital  Library  for 
consumption  by  decision  makers. 

•  HDF5  data  files  registered  in  data  warehouse. 
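
The "fire automatically" step above depends on noticing new raw files on the cluster. A minimal sketch of one polling pass in Java follows. The class name `NewFilePoller`, the drop directory, and the file suffix are assumptions for illustration and are not taken from the slides. The handler stands in for the raw-to-HDF5 conversion.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Polls a directory and hands each newly seen file to a handler exactly once. */
final class NewFilePoller {
    private final Path incomingDir;              // hypothetical drop directory
    private final String suffix;                 // hypothetical raw-file extension
    private final Set<Path> seen = new HashSet<>();

    NewFilePoller(Path incomingDir, String suffix) {
        this.incomingDir = incomingDir;
        this.suffix = suffix;
    }

    /** One polling pass: fires the handler on files not seen before and returns them. */
    List<Path> pollOnce(Consumer<Path> handler) throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(incomingDir, "*" + suffix)) {
            for (Path file : files) {
                if (seen.add(file)) {            // add() returns false for known files
                    fresh.add(file);
                    handler.accept(file);        // placeholder for the conversion step
                }
            }
        }
        return fresh;
    }
}
```

A caller would invoke `pollOnce` on a timer. A real deployment would also need a way to tell that a file has finished transferring before converting it, which this sketch does not attempt.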

Real Time Data Fusion (cont)

[Figure: data flow from instrumentation to the ATC DC cluster. Recoverable labels follow.]

• Flash Drive, CD: data files to concentrator

• i/products/antflow/

• 10-100 Mb/sec

• Data File Concentrator (Linux)

• 'while(true)' bash script to scp data files from concentrator to DC. ATC firewall allows DC ssh traffic into concentrator. SSH keys used to allow password-less (and unattended) ssh commands.

• AntFlow used to start processing scripts

• Aberdeen Test Center Distributed Center Linux Cluster @ARL-MSRC

• Timing labels: <10 minutes, <2 minutes

• Polling required, as [illegible] initiated file transfer not achievable. Increases latency and complexity.


UGC 2005


Real Time Data Fusion (cont)

[Figure: Concentrator and ATC DC (fasig). Recoverable path labels: /data/BLOBS/ /usr/people/mreil/ NEW ANT PROCESSES/ . . .]

Establish post processing capabilities to effect knowledge extraction

• Raw data files are converted to a common format - HDF5 chosen (http://hdf.ncsa.uiuc.edu/HDF5).

• Existing library of Java classes and *nix scripts to convert raw data files to HDF5. Originally single threaded Java code, extended to utilize multiple Java threads. Worked well on SMP machines (Sun E10K), but not on distributed processor/memory systems (Linux cluster). Processing is easy to parallelize: each thread gets one data file to convert. Java classes used lots of memory. The object oriented nature of the code contributed to this, as each data point was a Java object. Garbage collection times were also large.

• mpiJava - thin Java wrapper around MPICH. Created Java app that distributed processing of data files via message passing (MPI). Worked well, but required knowledge of the MPI framework and library. Also dependent on availability of MPICH for your OS/distro.
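
The "each thread gets one data file to convert" pattern can be sketched with the standard java.util.concurrent API. This is an illustration under assumptions, not the original ConvertToHDF5 code: the class name `ParallelConverter` is hypothetical, and the conversion itself is passed in as a placeholder function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelConverter {
    /**
     * Applies convert to every file name using a fixed pool of threads.
     * Results are returned in the same order as the inputs.
     */
    static <R> List<R> convertAll(List<String> files, int threads,
                                  Function<String, R> convert)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (String file : files) {
                Callable<R> task = () -> convert.apply(file);  // one task per data file
                pending.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : pending) {
                results.add(f.get());                          // waits for that file
            }
            return results;
        } finally {
            pool.shutdown();                                   // let worker threads exit
        }
    }
}
```

Because all tasks share one JVM and one heap, this approach only scales on a shared-memory machine, which matches the slide's observation that threads alone did not help on the cluster.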

Establish post processing capabilities to effect knowledge extraction (cont)


• JavaParty - http://www.ipd.uka.de/JavaParty/features.html - *allows
easy  port  of  multi-threaded  Java  programs  to  distributed 
environments  such  as  clusters.  Regular  Java  already  supports 
parallel  applications  with  threads  and  synchronization  mechanisms. 
While  multi-threaded  Java  programs  are  limited  to  a  single  address 
space,  JavaParty  extends  the  capabilities  of  Java  to  distributed 
computing  environments. 


*From  the  JavaParty  Web  Site 

Multiple Java Threads


public  class  ConvertToHDF5  extends  Thread  { 


ConvertToHDF5  worker  =  new  ConvertToHDF5 


mpiJava

[Screenshot: JBuilder X editor showing BlobToEnterpriseHDFApplication2_MPI.java. Recoverable code excerpt follows.]

    MPI.COMM_WORLD.Bcast(sBcast, 0, sBcast.length, MPI.OBJECT, 0);
    if (myrank != 0) {
        // ...
        // slaves extract these properties to local vars
        // ...
    }
    if (myrank == 0) {
        // ...
        // Send files to slaves for processing
        MPI.COMM_WORLD.Send(s, 0, 1, MPI.OBJECT, iSource, 0);
        // ...
    }
    else { // I am a slave
        if ...
        // Get file from master and process it
        // ...
        MPI.COMM_WORLD.Recv(s, 0, 1, MPI.OBJECT, MPI.ANY_SOURCE, ...
        // ...
    }
    }
    // ...
    MPI.Finalize();
    }
    }


JavaParty


    public remote class HelloJP {
        public void hello() {
            // Print on the console of the virtual machine where the object lives
            System.out.println("Hello JavaParty!");
        }

        public static void main(String[] args) {
            for (int n = 0; n < ...; n++) {
                // Create a remote ...
                HelloJP world = new HelloJP();
                // Remotely invoke ...
                world.hello();
            }
        }
    }

Each new 'remote' object is created on a slave processor. User can control which processor with the /** @/ */ construct in code.

JavaParty


•Uses  ssh  to  spawn  JVMs  on  slave  nodes  of  cluster  (similar  to  MPI) 
•One  JVM  per  slave  processor. 

•Controlled  via  .jp-nodefile  (similar  to  ‘machines’  file  used  with  MPI). 

•Pure  java  implementation  -  no  native  libraries  required. 

•Uses  RMI  to  serialize  java  objects  between  JVMs. 

•High  performance  RMI  engine  supplied  (KaRMI). 

•Possible  to  use  without  ‘breaking’  java  source  code  -  extend 
‘RemoteThread’  class  instead  of  using  ‘remote’  keyword. 

•This  is  the  framework  that  we  are  now  using. 

•Regular  java  -  invoke  application  : 

•java  <classname> 


•JavaParty  -  invoke  application  : 

•jpinvite  <classname> 

Establish post processing capabilities to effect knowledge extraction (cont)

[Charts: "Data Ingestion" and "Histogram Processing". Series: Single Processor, Cluster with 32 Processors, Cluster with 64 Processors. Vertical axis ticks run from 0 to 700; axis labels and data values are not recoverable.]

What is OLAP?


• Online Analytical Processing

• Software that enables decision support via rapid queries to large databases that store corporate data in multidimensional hierarchies and views.


T&E 


Institute  a  high  performance  data  warehouse 

•PostgreSQL  7.4  installed  on  dedicated  filesystem  (500  MB  RAID5 
JBOD)  on  head  node. 

•Java  based  web  application  ported  to  JavaParty.  Allows  data  set 
queries  submitted  by  the  web  app  user  to  be  run  on  all  nodes  of  the 
cluster  in  parallel  (for  aggregate  operations).  Tomcat  started  via 
‘javaparty’  rather  than  the  standard  ‘java’.  This  allows  servlets  to 
create  remote  objects,  which  run  on  the  remote  nodes. 

•Kerberos/SecurID authentication module written by PET IMT - Dr. Walter Landry @ ARL. Uses J2EE servlet filter framework and cookies to authenticate each HTTP request.

•GUI is a Java applet, which runs in the user's browser. GUI presents metadata to the user, who selects filter settings; the applet then submits an SQL statement on the user's behalf to the data warehouse. A list of data sets is returned. The user can then request that composite routines be run on the set of data files; these are run on the entire cluster in the JavaParty environment.
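
One way the applet's filter settings could be turned into an SQL statement is sketched below. This is an assumption-laden illustration, not the VISION code: the class `FilterQuery`, the table name, and the column names are hypothetical. Values chosen for the same column are joined with OR, different columns are joined with AND, and values are left as `?` placeholders for binding through a JDBC `PreparedStatement`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class FilterQuery {
    final String sql;            // statement text with '?' placeholders
    final List<String> params;   // values to bind, in placeholder order

    private FilterQuery(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    /**
     * Builds a SELECT from column -> accepted values.
     * Table and column names must come from a trusted list, never from user text.
     */
    static FilterQuery build(String table, Map<String, List<String>> filters) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        List<String> params = new ArrayList<>();
        String joiner = " WHERE ";
        for (Map.Entry<String, List<String>> e : filters.entrySet()) {
            if (e.getValue().isEmpty()) {
                continue;                         // no values selected for this column
            }
            sql.append(joiner).append('(');
            for (int i = 0; i < e.getValue().size(); i++) {
                if (i > 0) {
                    sql.append(" OR ");
                }
                sql.append(e.getKey()).append(" = ?");
                params.add(e.getValue().get(i));
            }
            sql.append(')');
            joiner = " AND ";
        }
        return new FilterQuery(sql.toString(), Collections.unmodifiableList(params));
    }
}
```

Keeping values out of the statement text avoids quoting problems and SQL injection when the statement is submitted on the user's behalf.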

Institute a high performance data warehouse

[Screenshot: "ATC VISION HPC Authentication" login page in Mozilla Firefox, at https://fasig.arl.hpc.mil/cas/login... The slide's bullets describing how the filter handles each HTTP request and redirects to the login page are cut off by the screenshot and are not recoverable.]


Institute  a  high  performance  data  warehouse 


[Diagram: data warehouse architecture. Recoverable labels follow.]

• User's Desktop

• VISION Digital Library (https://vdls.atc.army.mil)

• Web Services

• SSH

• ATC DC (fasig.arl.hpc.mil): JavaParty Environment, Apache Tomcat, Authentication Filter, SQL Executor Servlet, HDF5 Data File Accessor Servlet

• Concentrator

• Convert/Report/Load Java Applications

Screenshots Of OLAP GUI

[Screenshot 1: "VISION EU Database" applet, filter selection screen. Heading: "Select Filters To Be Applied To List Of All Data Files, And Then Click Next Button". Panels: Select Minimums (File Duration Minutes); Select Static Metadata Filter (Comments, Inventory Item, Test Location, Instrumentation); Select Dynamic Metadata Filter (VERSION NUMBER, TestingConfiguration, TOWING CONFIG, DRIVER NAME, COURSE NAME); Your Selected Filters (partly legible: filters on Inventory Item, course and vehicle configuration, with alternative values joined by OR, e.g. 'Loaded+Gravel'); Operations On Data Files That Match Filters; Use Calendar In Filter.]

[Screenshot 2: data file selection screen. Heading: "Select One Or More Data Files And One Of The Functions", with "Selected 1 of 216". Columns: res uuid, Asset, Test Item, Test Ctr. Id, Instr., Loaded Int..., File Start Time, File End Time, DRIVER N..., COURSE NAME. Sample entry: Test Item 1, Mon Mar 08 09:53:33 EST 2004 to Mon Mar 08 10:23:35 EST 2004, '59145333030820044D5400405306B28C'.]

Institute a high performance data warehouse

Over 80 projects using Data Warehouse

[Chart: Data Warehouse Volume By Project - March 4, 2005]

Real time quality control - utilizing historic data sets


•New  data  sets  compared  with  warehoused  data  from  the  same 
channel/test  item  for  anomaly  detection. 

•Future  Work 
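
The slides list this comparison as future work and do not say how it would be done. As one possible starting point, the sketch below flags a new value that lies more than k standard deviations from the mean of historic values for the same channel. The class `ChannelBaseline`, the threshold k, and the choice of a mean/standard-deviation test are all assumptions for illustration.

```java
/** Summary statistics for one channel, computed from warehoused (historic) samples. */
final class ChannelBaseline {
    final double mean;
    final double stdDev;

    ChannelBaseline(double[] historic) {
        if (historic.length < 2) {
            throw new IllegalArgumentException("need at least 2 historic samples");
        }
        double sum = 0.0;
        for (double v : historic) {
            sum += v;
        }
        mean = sum / historic.length;

        double squares = 0.0;
        for (double v : historic) {
            squares += (v - mean) * (v - mean);
        }
        stdDev = Math.sqrt(squares / (historic.length - 1));  // sample standard deviation
    }

    /** True when value lies more than k standard deviations from the historic mean. */
    boolean isAnomalous(double value, double k) {
        return Math.abs(value - mean) > k * stdDev;
    }
}
```

The samples are held in a primitive `double[]` rather than one object per data point, which sidesteps the memory and garbage collection cost noted earlier in the deck.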

Summary

•  Parallel  java  applications  are  running  very  well  on 
cluster. 

•  Polling  vs.  interrupt  (event)  driven  processing  not  ideal  - 
but  workable. 

•  ARL  MSRC  administering  the  system  is  ideal. 

• Data warehouse access requiring Kerberos/SecurID does not fit well with our current Digital Library project based authentication. ATC customers must obtain an HPCMP account in order to use the data warehouse (they don't even know they are using HPCMP assets).

• Special thanks to Tom Kendall, Chris Slaughter and Ryan Baxter at ARL-MSRC for assistance every step of the way!


Partnering  For  Success 